LSTM Autoencoders for Dialect Analysis

نویسندگان

Taraka Rama

Çagri Çöltekin

چکیده

Computational approaches for dialectometry employed Levenshtein distance to compute an aggregate similarity between two dialects belonging to a single language group. In this paper, we apply a sequence-to-sequence autoencoder to learn a deep representation for words that can be used for meaningful comparison across dialects. In contrast to the alignment-based methods, our method does not require explicit alignments. We apply our architectures to three different datasets and show that the learned representations indicate highly similar results with the analyses based on Levenshtein distance and capture the traditional dialectal differences shown by dialectologists.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Recent work on generative text modeling has found that variational autoencoders (VAE) with LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for...

متن کامل

A deep learning framework for financial time series using stacked autoencoders and long-short term memory

The application of deep learning approaches to finance has received a great deal of attention from both investors and researchers. This study presents a novel deep learning framework where wavelet transforms (WT), stacked autoencoders (SAEs) and long-short term memory (LSTM) are combined for stock price forecasting. The SAEs for hierarchically extracted deep features is introduced into stock pr...

متن کامل

Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders

A neural network model that significant improves unitselection-based Text-To-Speech synthesis is presented. The model employs a sequence-to-sequence LSTM-based autoencoder that compresses the acoustic and linguistic features of each unit to a fixed-size vector referred to as an embedding. Unit-selection is facilitated by formulating the target cost as an L2 distance in the embedding space. In o...

متن کامل

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval. Segmentation entails breaking words into their constituent stems, affixes and clitics. In this paper, we compare two approaches for segmenting four major Arabic dialects using only several thousand training examples for each dialect. The two approaches involve posing th...

متن کامل

Player Goal Recognition in Open-World Digital Games with Long Short-Term Memory Networks

Recent years have seen a growing interest in player modeling for digital games. Goal recognition, which aims to accurately recognize players’ goals from observations of low-level player actions, is a key problem in player modeling. However, player goal recognition poses significant challenges because of the inherent complexity and uncertainty pervading gameplay. In this paper, we formulate play...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

LSTM Autoencoders for Dialect Analysis

نویسندگان

چکیده

منابع مشابه

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

A deep learning framework for financial time series using stacked autoencoders and long-short term memory

Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

Player Goal Recognition in Open-World Digital Games with Long Short-Term Memory Networks

عنوان ژورنال:

اشتراک گذاری